Selection Procedures For A
Problem in Analysis of Variance*

Shanti S. Gupta, Purdue University
and
Deng-Yuan Huang, Academia Sinica, Taipei


1. Introduction

For a completely randomized block design with one observation per
cell, we express the observable random variables $X_{i\alpha}$
($i = 1,\ldots,k$; $\alpha = 1,\ldots,n$) as

(1.1)  $X_{i\alpha} = \mu + \beta_\alpha + \tau_i + \epsilon_{i\alpha}, \quad \sum_{i=1}^{k} \tau_i = 0,$

where $\mu$ is the mean effect, $\beta_\alpha$ are the block effects (nuisance
parameters for the fixed effects model), $\tau_1,\ldots,\tau_k$ are the treatment
effects, and $\epsilon_{i\alpha}$ are the error components. We assume that the
errors within each block are jointly normally distributed.

We assume that the quality of a treatment is judged by the largeness of
the $\tau_i$'s. A 'population' $\pi_i$ is called the best if $\tau_i$ is the
largest. In general, it may be complicated to derive suitable tests for the
hypotheses in which the experimenter is really interested. We apply the subset
selection approach (using certain basic hypotheses) and thus obtain more
appropriate information regarding the treatments. A subset selection procedure
is designed to select a subset so as to include the best population. Selection
of any subset that contains the best is called a correct selection (CS).

Roughly speaking, any two populations that are in the same selected subset
will be considered as "equivalently good". If all populations are selected,
we claim that all treatments are homogeneous. In general, to achieve the
objective of the experimenter, one should establish a suitable set of basic
hypotheses. Depending on the objective, one should consider different
ways of formulating the basic hypotheses. In this paper, we discuss a
method based on subset selection rules for the purpose of making a claim
of the type: $\tau_i = \tau^* \ge \tau_j + \Delta$ for all $i \in I$ and
$j \in J$, where $I$ and $J$ form a partition of $\{1,2,\ldots,k\}$. The
process of making such a claim will be called hypothesis identification.
This is achieved by setting up certain basic hypotheses regarding the
$\tau_i$'s and using a subset selection procedure to test these basic
hypotheses. It should be pointed out that in identifying an appropriate
hypothesis, we assume that the constant $\Delta$ in the claim is specified
by the experimenter, say, based on past experience. Associated with the
tests of the basic hypotheses using a selection rule, there are error
probabilities and the infimum of the probability of a correct selection for
the rule employed. These are related to the power function of these tests.
The sum of the average (over the basic hypotheses tested) of the error
probabilities and one minus the infimum of the probability of a correct
selection is called the identification risk. The main theorem of the
paper discusses the derivation of an optimal selection rule in the sense of
minimizing the identification risk. For a more general theory of
multiple decisions from the ranking and selection approach, one can refer to a
recent monograph by Gupta and Huang (1981). A general survey of the entire
field is provided in Gupta and Panchapakesan (1979).

Let $Y$ be a random observable vector with probability distribution depending
upon a parameter $\tau' = (\tau_1,\ldots,\tau_k) \in \Omega$. Consider a family
of hypothesis testing problems as follows:

(1.2)  $H_0: \tau \in \Omega_0$ vs $H_i: \tau \in \Omega_i$, $1 \le i \le k$,

where $\Omega_0 = \{\tau \mid \tau_1 = \cdots = \tau_k\}$ and
$\Omega_i = \{\tau \mid \tau_i > \max_{j \ne i} \tau_j\}$, $i = 1,2,\ldots,k$.
A test of the hypotheses (1.2) will be defined to be a vector
$(\delta_1(y),\ldots,\delta_k(y))$, where the elements of the vector are ordinary
test functions; when $y$ is observed we reject $H_0$ in favor of $H_i$ with
probability $\delta_i(y)$, $1 \le i \le k$. The power function of a test
$(\delta_1,\ldots,\delta_k)$ is defined to be the vector
$(\beta_1(\tau),\ldots,\beta_k(\tau))$, where $\beta_i(\tau) = E_\tau\delta_i(Y)$,
$1 \le i \le k$. For $\tau \in \Omega_i$, $\beta_i(\tau)$ is the probability of
a correct selection P(CS) and $\delta_i(y)$ is the individual selection
probability of selecting the best population $\pi_i$. Let $S(\gamma)$ be the set
of all the tests $(\delta_1,\ldots,\delta_k)$ such that

(1.3)  $E_\tau\delta_i(Y) \le \gamma$, $\tau \in \Omega_0$, $1 \le i \le k$,

where $\gamma$ is the upper bound on the error probabilities associated with the
treatment effects.

For each $i$ ($1 \le i \le k$), we would like to have $\beta_i(\tau)$ large when
$\tau \in \Omega_i$, subject to (1.3). For $\tau \in \Omega_i$, if we make
$\beta_i(\tau)$ large, then $\beta_j(\tau)$ should be small for $j \ne i$.

It  should  be  pointed  out  that  in  the  formulation  and  proof  of  the  optimal 
selection  procedure,  results  from  Neyman-Pearson  theory  are  used. 

2.  Formulation  of  an  Optimal  Selection  Procedure 

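Assume that

$X_\alpha' = (X_{1\alpha},\ldots,X_{k\alpha})$,

$\alpha = 1,\ldots,n$, are independently and identically distributed random
vectors with the following distribution:

(2.1)  $(2\pi\sigma^2)^{-kn/2}\,|A|^{-1/2}\exp\left[-\frac{1}{2\sigma^2}(x-\theta)'A^{-1}(x-\theta)\right],$

where $x' = (x_{11},\ldots,x_{k1};\ldots;x_{1n},\ldots,x_{kn})$ and
$\theta' = (\theta_{11},\ldots,\theta_{k1};\ldots;\theta_{1n},\ldots,\theta_{kn})$,

$A = (a_{ij})_{kn\times kn} = \mathrm{diag}(\Lambda,\ldots,\Lambda),$

where

$\Lambda = \begin{pmatrix} 1 & \lambda & \cdots & \lambda \\ \lambda & 1 & \cdots & \lambda \\ \vdots & & \ddots & \vdots \\ \lambda & \lambda & \cdots & 1 \end{pmatrix}_{k\times k}.$

We rewrite the original model as the general linear model as follows:

$X = \theta + \epsilon$, $\epsilon \sim N(0, \sigma^2 A)$.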

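Since we are interested in the differences between all pairs of $\tau_i$'s, we
transform the linear model to the following: For any $i$, let

$Y_i = C\tau_i + \eta$, $\eta \sim N(0, \sigma^2\Sigma_i)$,

where $\tau_i' = (\tau_{i1},\ldots,\tau_{ik})$, $\tau_{ij} = \tau_i - \tau_j$, $j \ne i$,

$Y_i' = (Y_{i11},\ldots,Y_{ik1};\ldots;Y_{i1n},\ldots,Y_{ikn})_{1\times(k-1)n}$,

$Y_{ij\alpha} = X_{i\alpha} - X_{j\alpha}$, $i \ne j$; $i,j = 1,\ldots,k$; $\alpha = 1,\ldots,n$,

$Y_i = A_i X$, $\eta = A_i\epsilon$,

$A_i = \mathrm{diag}(A_{i1},\ldots,A_{i1})_{(k-1)n\times kn}$,

where $A_{i1}$ is the $(k-1)\times k$ matrix that maps
$(X_{1\alpha},\ldots,X_{k\alpha})'$ to the differences $X_{i\alpha} - X_{j\alpha}$,
$j \ne i$, and $C' = (I_{k-1},\ldots,I_{k-1})_{(k-1)\times(k-1)n}$. Then

$(C'\Sigma_i^{-1}C)^{-1}C'\Sigma_i^{-1} = \frac{1}{n}\,(I_{k-1},\ldots,I_{k-1}).$

Hence,

$\hat\tau_i = (C'\Sigma_i^{-1}C)^{-1}C'\Sigma_i^{-1}Y_i = \left(\frac{1}{n}\sum_{\alpha=1}^{n}Y_{i1\alpha},\ldots,\frac{1}{n}\sum_{\alpha=1}^{n}Y_{ik\alpha}\right)' = (\bar Y_{i1},\ldots,\bar Y_{ik})',$

where $\bar Y_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n} Y_{ij\alpha}$, $j \ne i$.

The joint density of $Y_{i11},\ldots,Y_{ik1};\ldots;Y_{i1n},\ldots,Y_{ikn}$ is the following:

$p_{\tau_i}(y_i) = (2\pi\sigma^2)^{-(k-1)n/2}\,|\Sigma_i|^{-1/2}\exp\left[-\frac{1}{2\sigma^2}(y_i - C\tau_i)'\Sigma_i^{-1}(y_i - C\tau_i)\right],$

where

$\Sigma_i = A_i A A_i' = (1-\lambda)\,\mathrm{diag}(J,\ldots,J)_{(k-1)n\times(k-1)n}$,

$J = \begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix}_{(k-1)\times(k-1)}.$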

Now, we specify the $\Omega_i$'s as follows (Note that this is a different
specification from that given earlier):

$\Omega_i = \{\tau \mid \tau_i \ge \max_{j \ne i}\tau_j + \Delta\sigma\}$, $1 \le i \le k$,

and

$\Omega = \bigcup_{i=1}^{k}\Omega_i.$

Assume that $\sigma$ is known. Let

$\Delta_i' = (\Delta\sigma,\ldots,\Delta\sigma)_{1\times(k-1)}$, $i = 1,\ldots,k$, $\Delta > 0$.


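Thus

$\frac{p_{\Delta_i}(y_i)}{p_0(y_i)} = \exp\left\{\frac{1}{2\sigma^2}\left[-(y_i - C\Delta_i)'\Sigma_i^{-1}(y_i - C\Delta_i) + y_i'\Sigma_i^{-1}y_i\right]\right\}$

$= \exp\left\{\frac{1}{\sigma^2}\left[\Delta_i'C'\Sigma_i^{-1}y_i - \frac{1}{2}\Delta_i'C'\Sigma_i^{-1}C\Delta_i\right]\right\}$

$= \exp\left\{\frac{n\Delta}{\sigma(1-\lambda)k}\,(\bar y_{i1} + \cdots + \bar y_{ik}) - \frac{1}{2\sigma^2}\Delta_i'C'\Sigma_i^{-1}C\Delta_i\right\}.$

Hence, we can rewrite

$\frac{p_{\Delta_i}(y_i)}{p_0(y_i)} \ge d'$  as  $\bar y_{i1} + \cdots + \bar y_{ik} \ge d''\sigma.$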


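Let a selection rule $\delta^0 = (\delta_1^0,\ldots,\delta_k^0)$ be defined by

$\delta_i^0(y_i) = \begin{cases} 1 & \text{if } p_{\Delta_i}(y_i) \ge d'\,p_0(y_i), \\ 0 & \text{otherwise,} \end{cases}$

such that

(2.2)  $E_\tau\delta_i^0(Y_i) = \gamma$, $\tau \in \Omega_0$.

Then $\delta^0$ maximizes

(2.3)  $\inf_{\tau\in\Omega} P(CS \mid \delta)$

among all selection rules $\delta \in S(\gamma)$.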

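Note that $\delta_i^0(y_i)$ is also based on the maximum likelihood estimators
$\hat\tau_i$ of $\tau_i$. For any $\delta \in S(\gamma)$,
$\tau \in \Omega = \bigcup_{i=1}^{k}\Omega_i$ implies $\tau \in \Omega_i$ for
some $i$; thus

$P_\tau(CS \mid \delta) = \int \delta_i(y_i)\,p_\tau(y_i)\,d\nu(y_i) \ge \min_{1\le i\le k}\ \inf_{\tau\in\Omega_i}\int \delta_i(y_i)\,p_\tau(y_i)\,d\nu(y_i).$

We have

$\inf_{\tau\in\Omega} P(CS \mid \delta) = \min_{1\le i\le k}\ \inf_{\tau\in\Omega_i}\int \delta_i(y_i)\,p_\tau(y_i)\,d\nu(y_i).$

For any $\delta \in S(\gamma)$, it follows that

$\int (\delta_i - \delta_i^0)(p_{\Delta_i} - d'p_0) \le 0,$

which implies

$\int \delta_i\,p_{\Delta_i} \le \int \delta_i^0\,p_{\Delta_i}.$

Since $E_\tau\delta_i^0(Y_i)$ is nondecreasing in each $\tau_{ij}$, $j \ne i$, hence

$\inf_{\tau\in\Omega} P(CS \mid \delta^0) = \min_{1\le i\le k}\int \delta_i^0(y_i)\,p_{\Delta_i}(y_i)\,d\nu(y_i)$

$\ge \min_{1\le i\le k}\int \delta_i(y_i)\,p_{\Delta_i}(y_i)\,d\nu(y_i)$

$\ge \inf_{\tau\in\Omega} P(CS \mid \delta).$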


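We rewrite $\delta^0$ as follows:

$\delta_i^0(y_i) = \begin{cases} 1 & \text{if } \bar y_{i1} + \cdots + \bar y_{ik} \ge d''\sigma, \\ 0 & \text{otherwise.} \end{cases}$

Thus, the optimal subset selection rule is as follows:

$\delta_i^0(x) = \begin{cases} 1 & \text{if } \bar x_i \ge \frac{1}{k-1}\sum_{j\ne i}\bar x_j + d\sigma, \\ 0 & \text{otherwise,} \end{cases}$

where $d = d''/(k-1)$.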
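The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_subset`, the layout of the data (a list of blocks, each holding one observation per treatment) and the numerical values are hypothetical and do not come from the paper; $\sigma$ is taken as known and $d$ as already determined.

```python
from statistics import mean

def select_subset(x, d, sigma):
    """Return 1-based indices of treatments kept by the known-sigma rule.

    x     : list of n blocks, each a list of k observations (one per treatment)
    d     : selection constant (assumed to be given, e.g. from a table)
    sigma : known error standard deviation
    """
    k = len(x[0])
    # Treatment means over the n blocks.
    xbar = [mean(block[i] for block in x) for i in range(k)]
    selected = []
    for i in range(k):
        # Average of the other k-1 treatment means.
        others = (sum(xbar) - xbar[i]) / (k - 1)
        if xbar[i] >= others + d * sigma:
            selected.append(i + 1)
    return selected

# Hypothetical data: n = 3 blocks, k = 3 treatments.
data = [[5.0, 3.0, 3.1],
        [6.0, 4.1, 3.9],
        [7.0, 5.0, 5.0]]
print(select_subset(data, d=0.25, sigma=1.0))
```

An empty list corresponds to the situation in which no population is selected.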


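Now, we wish to determine $d$ and $n$. We make the following transformation:

$Z_{ik} = (1,\ldots,1)_{1\times(k-1)}\,(\bar Y_{i1},\ldots,\bar Y_{ik})' = \bar Y_{i1} + \cdots + \bar Y_{ik}$, and

$T = \tau_{i1} + \cdots + \tau_{ik} = (k-1)\tau_i - \sum_{j\ne i}\tau_j.$

Since the distribution of

$\hat\tau_i = (\bar Y_{i1},\ldots,\bar Y_{ik})' = (C'\Sigma_i^{-1}C)^{-1}C'\Sigma_i^{-1}Y_i$

is

$(2\pi\sigma^2)^{-(k-1)/2}\,|\Sigma_{i1}|^{-1/2}\exp\left[-\frac{1}{2\sigma^2}(\hat\tau_i - \tau_i)'\Sigma_{i1}^{-1}(\hat\tau_i - \tau_i)\right]$, where $\Sigma_{i1} = \frac{1-\lambda}{n}\,J$,

the distribution of $Z_{ik}$ is

$\left[2\pi\sigma^2(1-\lambda)k(k-1)n^{-1}\right]^{-1/2}\exp\left[-\frac{n}{2\sigma^2(1-\lambda)k(k-1)}\,(z_{ik} - T)^2\right].$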
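The variance factor $(1-\lambda)k(k-1)$ in the distribution of $Z_{ik}$ can be checked numerically. The sketch below assumes that $\Lambda$ has 1 on the diagonal and $\lambda$ elsewhere, and uses the fact that $Z_{ik}$ is the contrast with weight $k-1$ on $\bar X_i$ and $-1$ on every other treatment mean; the function name is hypothetical.

```python
def contrast_variance_factor(k, lam):
    """Return c' Lambda c for the contrast c that maps treatment means to Z_ik.

    Lambda is assumed to be k x k with 1 on the diagonal and lam elsewhere.
    The contrast has k-1 in position i (taken as position 0) and -1 elsewhere,
    so Var(Z_ik) = sigma^2 * (c' Lambda c) / n.
    """
    Lambda = [[1.0 if r == s else lam for s in range(k)] for r in range(k)]
    c = [k - 1.0] + [-1.0] * (k - 1)
    return sum(c[r] * Lambda[r][s] * c[s] for r in range(k) for s in range(k))

for k, lam in [(2, 0.5), (3, 0.5), (4, 0.2)]:
    direct = contrast_variance_factor(k, lam)
    closed_form = (1 - lam) * k * (k - 1)
    print(k, lam, direct, closed_form)
```

The two printed values agree for each pair $(k, \lambda)$, which is the factor that multiplies $\sigma^2/n$ in the density above.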


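Hence

(2.4)  $E_0\delta_i^0(Y_i) = P_0(Z_{ik} \ge d''\sigma) = \Phi\left[-\frac{d''\sqrt{n}}{\sqrt{(1-\lambda)k(k-1)}}\right] = \gamma,$

and

$\inf_{\tau\in\Omega} P(CS \mid \delta^0) = \min_{1\le i\le k}\int \delta_i^0(y_i)\,p_{\Delta_i}(y_i)\,d\nu(y_i)$

$= \min_{1\le i\le k} P_{\Delta_i}(Z_{ik} \ge d''\sigma)$

$= \min_{1\le i\le k} P_{\Delta_i}\left(\frac{(Z_{ik} - (k-1)\Delta\sigma)\sqrt{n}}{\sigma\sqrt{(1-\lambda)k(k-1)}} \ge \frac{(d'' - (k-1)\Delta)\sqrt{n}}{\sqrt{(1-\lambda)k(k-1)}}\right)$

(2.5)  $= \Phi\left[-\frac{(d'' - (k-1)\Delta)\sqrt{n}}{\sqrt{(1-\lambda)k(k-1)}}\right] = P^*.$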

For given $\gamma$, $P^*$, $k$, $\lambda$, and $\Delta$, we can find $d''$ and the
smallest number of blocks, $n$, to satisfy equations (2.4) and (2.5). Note that
this $n$ is also the minimum sample size for the case of one observation per
cell in the completely randomized block design.

We rewrite (2.4) and (2.5) as

$\Phi\left[-\frac{d\sqrt{n(k-1)}}{\sqrt{(1-\lambda)k}}\right] = \gamma,$

and

$\Phi\left[-\frac{(d-\Delta)\sqrt{n(k-1)}}{\sqrt{(1-\lambda)k}}\right] = P^*.$

Let $z_{P^*}$ and $z_\gamma$ represent the upper percentage points corresponding to
$P^*$ and $\gamma$, respectively, of the standard normal distribution, so that
$\Phi(-z_p) = p$. Then we have

$d = \frac{z_\gamma\,\Delta}{z_\gamma - z_{P^*}},$

and

$n = \left\langle \frac{(1-\lambda)k\,(z_{P^*} - z_\gamma)^2}{(k-1)\Delta^2} \right\rangle,$

where $\langle a \rangle$ is the smallest integer greater than or equal to $a$.

Summarizing the previous results, we obtain the following theorem.

Theorem: Under model (1.1) with the stated assumption on $\epsilon$, an optimal
procedure for selecting a subset of the "best" or "worthwhile" treatments
based on the observed data $x$ and satisfying the conditions (2.2)
and (2.3) is: Select the population $\pi_i$ with probability $\delta_i^0(x)$
given by

$\delta_i^0(x) = \begin{cases} 1 & \text{if } \bar x_i \ge \frac{1}{k-1}\sum_{j\ne i}\bar x_j + d\sigma, \\ 0 & \text{otherwise,} \end{cases}$

where the smallest values of $d$ and $n$ are given by

$d = \frac{z_\gamma\,\Delta}{z_\gamma - z_{P^*}}$

and

$n = \left\langle \frac{(1-\lambda)k\,(z_{P^*} - z_\gamma)^2}{(k-1)\Delta^2} \right\rangle.$
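As a rough numerical companion to the formulas for $d$ and $n$, the following sketch evaluates them with the standard library's normal distribution. It assumes the convention $\Phi(-z_p) = p$ for the upper percentage points; the function names are hypothetical. The second function plugs $d$ and the rounded-up $n$ back into the two $\Phi$ expressions to show the error probability and the infimum of P(CS) that are actually attained.

```python
from math import ceil, sqrt
from statistics import NormalDist

def design_constants(gamma, p_star, k, lam, delta):
    """Return (d, n) for the known-sigma rule, using Phi(-z_p) = p."""
    std = NormalDist()
    z_gamma = std.inv_cdf(1 - gamma)    # upper percentage point for gamma
    z_pstar = std.inv_cdf(1 - p_star)   # upper percentage point for P*
    d = z_gamma * delta / (z_gamma - z_pstar)
    n = ceil((1 - lam) * k * (z_pstar - z_gamma) ** 2 / ((k - 1) * delta ** 2))
    return d, n

def achieved_levels(d, n, k, lam, delta):
    """Return (error probability, infimum of P(CS)) attained with d and n."""
    std = NormalDist()
    scale = sqrt(n * (k - 1) / ((1 - lam) * k))
    return std.cdf(-d * scale), std.cdf(-(d - delta) * scale)

d, n = design_constants(gamma=0.05, p_star=0.95, k=2, lam=0.5, delta=0.5)
print(d, n, achieved_levels(d, n, k=2, lam=0.5, delta=0.5))
```

Because $n$ is rounded up to an integer, the attained error probability is at most $\gamma$ and the attained infimum of P(CS) is at least $P^*$.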


Furthermore, we have established the following connection between the
selection procedure and the hypothesis identification problem:
If $\pi_{i_1}, \pi_{i_2}, \ldots, \pi_{i_j}$ ($j \le k$) are selected, we say that
these populations are homogeneous and make the hypothesis identification

$H': \tau_{i_1} = \cdots = \tau_{i_j} \ge \max_{1\le\ell\le k,\ \ell\notin\{i_1,\ldots,i_j\}} \tau_\ell + \Delta\sigma.$

Note that the overall identification risk connected with this problem is
$\gamma + (1-P^*)$.

Remark: It should be pointed out that for some pairs $(\gamma, P^*)$, $\delta^0$
may not select any population. This is to be interpreted as not identifying any
one of the appropriate hypotheses.

We consider some special cases to provide an idea as to the appropriate
identification of one of the hypotheses. Let $\gamma = 0.05$, $\lambda = 0.5$ and
$P^* = 0.95, 0.90, 0.80$; then

(i) k = 2,

$H_0: \tau_1 = \tau_2$,  $H_1: \tau_1 \ge \tau_2 + \Delta\sigma$,  $H_2: \tau_2 \ge \tau_1 + \Delta\sigma$.

In this case, for specified $\Delta$-values, the smallest $d$ and $n$ needed for
the optimal selection rule are given in the following table (entries are listed
for $P^* = 0.95, 0.90, 0.80$, in that order).

  $\Delta$             |      0.1       |      0.5       |      1.0       |      2.0
  d(0.95,0.90,0.80)  | 0.05,0.06,0.07 | 0.25,0.32,0.33 | 0.50,0.64,0.66 | (not legible)
  n(0.95,0.90,0.80)  | 1089,858,620   | 44,35,25       | 11,9,7         | 3,3,2

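(ii) k = 3,

$H_0: \tau_1 = \tau_2 = \tau_3$,
$H_1: \tau_1 \ge \max(\tau_2,\tau_3) + \Delta\sigma$,
$H_2: \tau_2 \ge \max(\tau_1,\tau_3) + \Delta\sigma$,
$H_3: \tau_3 \ge \max(\tau_1,\tau_2) + \Delta\sigma$,
$H_4: \tau_1 = \tau_2 \ge \tau_3 + \Delta\sigma$,
$H_5: \tau_1 = \tau_3 \ge \tau_2 + \Delta\sigma$,
$H_6: \tau_2 = \tau_3 \ge \tau_1 + \Delta\sigma$.

For the optimal selection rule, the minimum values of $d$ and $n$ are computed
(for specified values of $\Delta$) and given in the following table.

  $\Delta$             |      0.1       |      0.5       |      1.0       |      2.0
  d(0.95,0.90,0.80)  | 0.05,0.06,0.07 | 0.25,0.32,0.33 | 0.50,0.64,0.66 | 1.00,1.29,1.33
  n(0.95,0.90,0.80)  | 817,644,465    | 33,26,19       | (not legible)  | 3,2,2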

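(iii) k = 4,

$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4$,
$H_1: \tau_1 \ge \max(\tau_2,\tau_3,\tau_4) + \Delta\sigma$,
$H_2: \tau_2 \ge \max(\tau_1,\tau_3,\tau_4) + \Delta\sigma$,
$H_3: \tau_3 \ge \max(\tau_1,\tau_2,\tau_4) + \Delta\sigma$,
$H_4: \tau_4 \ge \max(\tau_1,\tau_2,\tau_3) + \Delta\sigma$,
$H_5: \tau_1 = \tau_2 \ge \max(\tau_3,\tau_4) + \Delta\sigma$,
$H_6: \tau_1 = \tau_3 \ge \max(\tau_2,\tau_4) + \Delta\sigma$,
$H_7: \tau_1 = \tau_4 \ge \max(\tau_2,\tau_3) + \Delta\sigma$,
$H_8: \tau_2 = \tau_3 \ge \max(\tau_1,\tau_4) + \Delta\sigma$,
$H_9: \tau_2 = \tau_4 \ge \max(\tau_1,\tau_3) + \Delta\sigma$,
$H_{10}: \tau_3 = \tau_4 \ge \max(\tau_1,\tau_2) + \Delta\sigma$,
$H_{11}: \tau_1 = \tau_2 = \tau_3 \ge \tau_4 + \Delta\sigma$,
$H_{12}: \tau_1 = \tau_2 = \tau_4 \ge \tau_3 + \Delta\sigma$,
$H_{13}: \tau_1 = \tau_3 = \tau_4 \ge \tau_2 + \Delta\sigma$,
$H_{14}: \tau_2 = \tau_3 = \tau_4 \ge \tau_1 + \Delta\sigma$.

For the optimal selection rule, the minimum values of $d$ and $n$ are computed
(for specified values of $\Delta$) and given in the following table.

  $\Delta$             |      0.1       |      0.5       |      1.0       |      2.0
  d(0.95,0.90,0.80)  | 0.05,0.06,0.07 | 0.25,0.32,0.33 | 0.50,0.64,0.66 | 1.00,1.29,1.33
  n(0.95,0.90,0.80)  | 726,572,413    | 30,23,17       | 8,6,5          | 2,2,2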

Note that $P^*$ is the probability of correct selection for the associated subset
selection rule, while the error probability $\gamma$ is controlled at the 5 percent
level. The identification risk is $0.05 + (1-P^*)$. We can explain the cases
described above as follows: for k = 2, if the selected subset contains $\pi_i$
only, we identify $H_i$, $i = 1,2$; if it contains $\pi_1$ and $\pi_2$ we identify
$H_0$. For k = 3, if the selected subset contains $\pi_i$ only, we identify $H_i$,
$i = 1,2,3$; if it contains $\pi_1$ and $\pi_2$, $\pi_1$ and $\pi_3$, or $\pi_2$ and
$\pi_3$ only, we identify $H_4$, $H_5$ or $H_6$, respectively.
Similar discussion applies to the case k = 4.
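The mapping from a selected subset to the identified hypothesis can be written out for general $k$. The sketch below is illustrative only: the function names and the text format of the returned hypotheses are hypothetical. It also counts the alternatives, one for every nonempty proper subset of $\{1,\ldots,k\}$, which gives 6 for k = 3 and 14 for k = 4.

```python
from itertools import combinations

def identify(selected, k):
    """Describe the hypothesis identified from a selected subset.

    selected : iterable of 1-based treatment indices kept by the rule
    k        : number of treatments
    """
    chosen = sorted(set(selected))
    rest = [j for j in range(1, k + 1) if j not in chosen]
    if not chosen:
        return None  # nothing selected: no hypothesis is identified
    equal_part = " = ".join(f"tau_{i}" for i in chosen)
    if not rest:
        return f"H0: {equal_part}"  # all selected: treatments homogeneous
    if len(rest) == 1:
        rhs = f"tau_{rest[0]}"
    else:
        rhs = "max(" + ", ".join(f"tau_{j}" for j in rest) + ")"
    return f"{equal_part} >= {rhs} + Delta*sigma"

def all_alternatives(k):
    """List the identifiable alternatives, one per nonempty proper subset."""
    out = []
    for size in range(1, k):
        for subset in combinations(range(1, k + 1), size):
            out.append(identify(subset, k))
    return out

print(identify([1, 2], 3))
print(len(all_alternatives(3)), len(all_alternatives(4)))
```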


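Now, we discuss the case where $\sigma$ is unknown. For any $i$, the maximum
likelihood estimators of $\tau_i$ and $\sigma^2$ are:

$\hat\tau_i = (C'\Sigma_i^{-1}C)^{-1}C'\Sigma_i^{-1}Y_i = (\bar Y_{i1},\ldots,\bar Y_{ik})'$

and

$\hat\sigma^2 = \frac{1}{(k-1)(n-1)}\,Y_i'\left[\Sigma_i^{-1} - \Sigma_i^{-1}C(C'\Sigma_i^{-1}C)^{-1}C'\Sigma_i^{-1}\right]Y_i.$

We know that $\hat\sigma^2$ and $\hat\tau_i$ are independent and the distribution
$f(s)$ of $s = \hat\sigma/\sigma$ is that of $\chi_p/\sqrt{p}$ with $p = (k-1)(n-1)$.

As before, we define the selection rule as follows:

$\varphi_i^0(y_i) = \begin{cases} 1 & \text{if } \bar y_{i1} + \cdots + \bar y_{ik} \ge d_1'\hat\sigma, \\ 0 & \text{otherwise,} \end{cases}$

or

$\varphi_i^0(x) = \begin{cases} 1 & \text{if } \bar x_i \ge \frac{1}{k-1}\sum_{j\ne i}\bar x_j + d_1\hat\sigma, \\ 0 & \text{otherwise,} \end{cases}$

where $d_1 = d_1'/(k-1)$.

Conditionally, for an observed value of $\hat\sigma$, we can discuss the optimality
as before. However, the constants $d_1$ and $n$ can be determined without
conditioning on $\hat\sigma$ from

(2.6)  $\int_0^\infty \Phi\left[-\frac{d_1' s\sqrt{n}}{\sqrt{(1-\lambda)k(k-1)}}\right] f(s)\,ds = \gamma,$

and

(2.7)  $\inf_{\tau\in\Omega} P(CS \mid \varphi^0) = \int_0^\infty \Phi\left[-\frac{(d_1' s - (k-1)\Delta)\sqrt{n}}{\sqrt{(1-\lambda)k(k-1)}}\right] f(s)\,ds = P^*.$

This gives

(2.8)  $t\left[-\frac{d_1\sqrt{n(k-1)}}{\sqrt{(1-\lambda)k}};\ (k-1)(n-1),\ 0\right] = \gamma,$

and

(2.9)  $t\left[-\frac{d_1\sqrt{n(k-1)}}{\sqrt{(1-\lambda)k}};\ (k-1)(n-1),\ \frac{\Delta\sqrt{n(k-1)}}{\sqrt{(1-\lambda)k}}\right] = P^*,$

where $t(a; b, c)$ is the percentage point of the noncentral t with $b$ degrees
of freedom and the noncentrality parameter $c$.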
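The integrals (2.6) and (2.7) can also be evaluated directly, without noncentral t tables. The following sketch is one way to do this with the standard library only; it is not the authors' computation. It assumes that $f(s)$ is the density of $\chi_p/\sqrt{p}$ with $p = (k-1)(n-1)$ and uses a plain midpoint rule; the function names, the integration range and the number of steps are hypothetical choices.

```python
from math import atan, exp, lgamma, log, pi, sqrt
from statistics import NormalDist

def scaled_chi_pdf(s, p):
    """Density of s = chi_p / sqrt(p) at s > 0, computed on the log scale."""
    log_f = (log(2.0) + 0.5 * p * log(0.5 * p) - lgamma(0.5 * p)
             + (p - 1) * log(s) - 0.5 * p * s * s)
    return exp(log_f)

def mixture_probability(c, shift, p, steps=20000):
    """Midpoint-rule value of the integral of Phi(-(c*s - shift)) f(s) ds."""
    phi = NormalDist().cdf
    upper = 1.0 + 10.0 / sqrt(p)  # f(s) is negligible beyond this point
    h = upper / steps
    total = 0.0
    for m in range(steps):
        s = (m + 0.5) * h
        total += phi(-(c * s - shift)) * scaled_chi_pdf(s, p)
    return total * h

def levels_unknown_sigma(d1, n, k, lam, delta):
    """Return the left-hand sides of (2.6) and (2.7) for given d1 and n."""
    p = (k - 1) * (n - 1)
    scale = sqrt(n * (k - 1) / ((1 - lam) * k))
    error_prob = mixture_probability(d1 * scale, 0.0, p)
    inf_pcs = mixture_probability(d1 * scale, delta * scale, p)
    return error_prob, inf_pcs

# Sanity check for p = 1, where the ratio is Cauchy distributed.
print(mixture_probability(1.0, 0.0, 1), 0.5 - atan(1.0) / pi)
print(levels_unknown_sigma(d1=0.25, n=10, k=3, lam=0.5, delta=0.5))
```

Values of $d_1$ and $n$ meeting targets $\gamma$ and $P^*$ could then be found by searching over $n$ and solving (2.6) for $d_1$ at each $n$; that search is left out of the sketch.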